An Exploratory Study of Field Failures
Field failures, that is, failures caused by faults that escape the testing
phase, are unavoidable. Improving verification and validation activities
before deployment can identify and promptly remove many, but not all, faults,
and users may still experience annoying problems while using their software
systems. This paper investigates the nature of field failures to understand to
what extent further improving in-house verification and validation activities
can reduce the number of failures in the field, and frames the need for new
approaches that operate in the field. We report the results of analyzing the
bug reports of five applications belonging to three different ecosystems,
propose a taxonomy of field failures, and discuss why failures in the
identified classes cannot be detected at design time and must instead be
addressed at runtime. We observe that many faults (70%) are intrinsically hard
to detect at design time.
Automated, Cost-effective, and Update-driven App Testing
Apps' pervasive role in our society has led to the definition of test
automation approaches to ensure their dependability. However, state-of-the-art approaches
tend to generate large numbers of test inputs and are unlikely to achieve more
than 50% method coverage. In this paper, we propose a strategy to achieve
significantly higher coverage of the code affected by updates with a much
smaller number of test inputs, thus alleviating the test oracle problem. More
specifically, we present ATUA, a model-based approach that synthesizes App
models with static analysis, integrates a dynamically-refined state abstraction
function and combines complementary testing strategies, including (1) coverage
of the model structure, (2) coverage of the App code, (3) random exploration,
and (4) coverage of dependencies identified through information retrieval. Its
model-based strategy enables ATUA to generate a small set of inputs that
exercise only the code affected by the updates. In turn, this makes common test
oracle solutions more cost-effective as they tend to involve human effort. A
large empirical evaluation, conducted with 72 App versions belonging to nine
popular Android Apps, has shown that ATUA is more effective and less
effort-intensive than state-of-the-art approaches when testing App updates.
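The combination of testing strategies described above can be pictured as a prioritized input-selection loop. The sketch below is an illustrative assumption, not ATUA's actual implementation: the `Candidate` layout, the scoring weights, and the function name are hypothetical, chosen only to show how the four criteria (model-structure coverage, update-affected code coverage, random exploration, and dependency scores) could be combined.

```python
import random
from dataclasses import dataclass

# Hypothetical sketch of an ATUA-style strategy combination: each candidate
# input is scored by how much new coverage it promises under complementary
# criteria, and the highest-scoring input is exercised next.

@dataclass(frozen=True)
class Candidate:
    target: str            # method the input ultimately exercises
    transitions: frozenset # model transitions it traverses
    methods: frozenset     # App methods it covers

def select_next_input(candidates, covered_transitions, covered_methods,
                      update_methods, dependency_scores):
    """Pick the candidate input with the best combined coverage score."""
    def score(c):
        # (1) coverage of the model structure: unvisited transitions
        model_gain = len(c.transitions - covered_transitions)
        # (2) coverage of the App code, restricted to update-affected methods
        code_gain = len((c.methods & update_methods) - covered_methods)
        # (4) dependencies identified through information retrieval
        dep_gain = dependency_scores.get(c.target, 0.0)
        return 2 * code_gain + model_gain + dep_gain

    best = max(candidates, key=score)
    # (3) random exploration as a fallback when no candidate adds coverage
    if score(best) == 0:
        best = random.choice(candidates)
    return best
```

Weighting update-affected code highest reflects the abstract's goal of exercising only code affected by updates with few inputs; the actual weights in ATUA are not given in the abstract.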
Automatic Generation of Acceptance Test Cases from Use Case Specifications: an NLP-based Approach
Acceptance testing is a validation activity performed to ensure the conformance of software systems with respect to their functional requirements. In safety critical systems, it plays a crucial role since it is enforced by software standards, which mandate that each requirement be validated by such testing in a clearly traceable manner. Test engineers need to identify all the representative test execution scenarios from requirements, determine the runtime conditions that trigger these scenarios, and finally provide the input data that satisfy these conditions. Given that requirements specifications are typically large and often provided in natural language (e.g., use case specifications), the generation of acceptance test cases tends to be expensive and error-prone. In this paper, we present Use Case Modeling for System-level, Acceptance Tests Generation (UMTG), an approach that supports the generation of executable, system-level, acceptance test cases from requirements specifications in natural language, with the goal of reducing the manual effort required to generate test cases and ensuring requirements coverage. More specifically, UMTG automates the generation of acceptance test cases based on use case specifications and a domain model for the system under test, which are commonly produced in many development environments. Unlike existing approaches, it does not impose strong restrictions on the expressiveness of use case specifications. We rely on recent advances in natural language processing to automatically identify test scenarios and to generate formal constraints that capture conditions triggering the execution of the scenarios, thus enabling the generation of test data. 
In two industrial case studies, UMTG automatically and correctly translated 95% of the use case specification steps into the formal constraints required for test data generation; furthermore, it generated test cases that exercise not only all the test scenarios manually implemented by experts, but also some critical scenarios not previously considered.
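The step-to-constraint translation can be illustrated in a deliberately simplified form. The regex patterns and constraint syntax below are assumptions for illustration only; UMTG's actual pipeline relies on natural language processing (not keyword matching) to derive the constraints.

```python
import re

# Toy illustration of mapping a use-case step in natural language to a
# formal constraint on test data. The patterns and output syntax are
# placeholders, not UMTG's NLP-based analysis.

PATTERNS = [
    # "The system validates that the <entity> is <value>"
    (re.compile(r"validates that the (\w+) is (\w+)", re.I),
     lambda m: f"{m.group(1)} = {m.group(2)}"),
    # "The <entity> must be greater than <number>"
    (re.compile(r"the (\w+) must be greater than (\d+)", re.I),
     lambda m: f"{m.group(1)} > {m.group(2)}"),
]

def step_to_constraint(step: str):
    """Return a constraint string for a recognized step, else None."""
    for pattern, build in PATTERNS:
        match = pattern.search(step)
        if match:
            return build(match)
    return None
```

Unrecognized steps returning `None` mirrors the practical need to flag steps whose conditions cannot be formalized automatically.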
TkT: Automatic Inference of Timed and Extended Pushdown Automata
To mitigate the cost of manually producing and maintaining models that capture software specifications, specification mining techniques can be exploited to automatically derive up-to-date models that faithfully represent the behavior of software systems. So far, specification mining solutions have focused on extracting information about the functional behavior of the system, especially in the form of models that represent the ordering of operations. Well-known examples are finite state models capturing the usage protocol of software interfaces and temporal rules specifying relations among system events. Although the functional behavior of a software system is a primary concern, several non-functional characteristics typically must be addressed jointly with it. Efficiency is one of the most relevant: an application that delivers the right functionality inefficiently is likely to fall short of its users' expectations. Interestingly, the timing behavior depends strongly on the functional behavior of a software system; for instance, the timing of an operation depends on the functional complexity and size of the computation performed. Consequently, models that combine the functional and timing behaviors, as well as their dependencies, are extremely important for reasoning precisely about the behavior of software systems. In this paper, we address the challenge of generating models that capture both the functional and timing behavior of a software system from execution traces. The result is the Timed k-Tail (TkT) specification mining technique, which can mine finite state models that capture this interplay: the functional behavior is represented by the possible orderings of the events accepted by the transitions, while the timing behavior is represented through clocks and clock constraints of different kinds associated with the transitions.
Our empirical evaluation with several libraries and applications shows that TkT can generate accurate models, capable of supporting the identification of timing anomalies due to overloaded environments and performance faults. Furthermore, our study shows that TkT outperforms state-of-the-art techniques in terms of scalability and accuracy of the mined models.
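A finite state model extended with clocks, like those TkT mines, can be replayed against a trace roughly as follows. The data layout here is a hypothetical simplification (one clock, reset on every transition); TkT's mined models associate clock constraints of several kinds with transitions.

```python
# Simplified replay of an event trace against a timed finite state model:
# each transition carries an event label and a timing guard (min/max
# elapsed time since the clock was last reset), and firing a transition
# resets the clock - in the spirit of TkT's clocks and clock constraints.

def accepts(model, trace):
    """model: {state: {event: (next_state, min_t, max_t)}};
    trace: list of (event, timestamp) pairs in increasing time order.
    Returns True iff every step fires a transition whose guard holds."""
    state, clock_start = "start", None
    for event, t in trace:
        if clock_start is None:
            clock_start = t                     # clock starts at first event
        transitions = model.get(state, {})
        if event not in transitions:
            return False                        # functional violation
        next_state, min_t, max_t = transitions[event]
        elapsed = t - clock_start
        if not (min_t <= elapsed <= max_t):
            return False                        # timing anomaly
        state, clock_start = next_state, t      # fire and reset the clock
    return True
```

A trace that follows the right event order but violates a timing guard is rejected, which is exactly the kind of timing anomaly (e.g., from an overloaded environment) the mined models help identify.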
Simulator-based explanation and debugging of hazard-triggering events in DNN-based safety-critical systems
When Deep Neural Networks (DNNs) are used in safety-critical systems,
engineers should determine the safety risks associated with DNN errors observed
during testing. For DNNs processing images, engineers visually inspect all
error-inducing images to determine common characteristics among them. Such
characteristics correspond to hazard-triggering events (e.g., low illumination)
that are essential inputs for safety analysis. Though informative, such
activity is expensive and error-prone.
To support such safety analysis practices, we propose SEDE, a technique that
generates readable descriptions for commonalities in error-inducing, real-world
images and improves the DNN through effective retraining. SEDE leverages the
availability of simulators, which are commonly used for cyber-physical systems.
SEDE relies on genetic algorithms to drive simulators towards the generation of
images that are similar to error-inducing, real-world images in the test set;
it then leverages rule learning algorithms to derive expressions that capture
commonalities in terms of simulator parameter values. The derived expressions
are then used to generate additional images to retrain and improve the DNN.
With DNNs performing in-car sensing tasks, SEDE successfully characterized
hazard-triggering events leading to a DNN accuracy drop. Also, SEDE enabled
retraining to achieve significant improvements in DNN accuracy, up to 18
percentage points.
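SEDE's first step, driving the simulator towards images similar to error-inducing ones, can be sketched as a minimal genetic algorithm over simulator parameters. Everything below is a placeholder: the `render` stub stands in for a real simulator, the feature vectors stand in for image representations, and the selection/crossover/mutation settings are illustrative, not SEDE's configuration.

```python
import random

# Minimal genetic-algorithm sketch in the spirit of SEDE's simulator-driving
# step: evolve simulator parameter vectors so that the rendered result
# resembles the representation of an error-inducing real-world image.

def render(params):
    # Placeholder for the simulator: here the parameter vector itself
    # serves as the "image" feature vector.
    return params

def fitness(params, target):
    """Negative squared distance to the target representation (higher is better)."""
    return -sum((p - t) ** 2 for p, t in zip(render(params), target))

def evolve(target, dim=3, pop_size=20, generations=60, rng=None):
    rng = rng or random.Random(0)               # deterministic by default
    pop = [[rng.uniform(0, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(p, target), reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection (elitist)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, dim)         # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(dim)              # single-coordinate mutation
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda p: fitness(p, target))
```

In SEDE the surviving parameter values are then summarized by rule learning into readable expressions (e.g., illumination below a threshold), and those expressions drive the generation of additional retraining images.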